Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure published messages are acknowledged for play mode #951

Merged
merged 4 commits into from
Feb 1, 2022

Conversation

Barry-Xu-2018
Copy link
Contributor

@Barry-Xu-2018 Barry-Xu-2018 commented Jan 27, 2022

Address #571

After test with the latest code for rolling (use fastdds), #571 (follow steps to reproduce in #571) cannot reproduce.
Note that synchronous is set as default for fastdds recently(ros2/rmw_fastrtps@e7c749a). But even if I use async mode, #571 also cannot reproduce.

Signed-off-by: Barry Xu <barry.xu@sony.com>
@Barry-Xu-2018 Barry-Xu-2018 requested a review from a team as a code owner January 27, 2022 03:31
@Barry-Xu-2018 Barry-Xu-2018 requested review from emersonknapp and jhdcs and removed request for a team January 27, 2022 03:31
@Barry-Xu-2018
Copy link
Contributor Author

The failure on ament_uncrustify is related to file mock_storage.hpp, which isn't changed in this PR.

--- rosbag2_compression/test/rosbag2_compression/mock_storage.hpp
+++ rosbag2_compression/test/rosbag2_compression/mock_storage.hpp.uncrustify
@@ -43 +43 @@
-    void(const std::vector<std::shared_ptr<const rosbag2_storage::SerializedBagMessage>> &));
+    void(const std::vector<std::shared_ptr<const rosbag2_storage::SerializedBagMessage>>&));

Code style divergence in file 'rosbag2_cpp/test/rosbag2_cpp/mock_storage.hpp':

--- rosbag2_cpp/test/rosbag2_cpp/mock_storage.hpp
+++ rosbag2_cpp/test/rosbag2_cpp/mock_storage.hpp.uncrustify
@@ -44 +44 @@
-    void(const std::vector<std::shared_ptr<const rosbag2_storage::SerializedBagMessage>> &));
+    void(const std::vector<std::shared_ptr<const rosbag2_storage::SerializedBagMessage>>&));

2 files with code style divergence
Error: The process '/usr/bin/bash' failed with exit code 1

@Barry-Xu-2018
Copy link
Contributor Author

@fujitatomoya @emersonknapp
Please help to review.

@fujitatomoya
Copy link
Contributor

@Barry-Xu-2018 will do in a couple of days.

Copy link
Contributor

@jhdcs jhdcs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I only saw a couple comments that might use a bit of tweaking, though I don't think it's enough to hold up the review over.

Since you requested @fujitatomoya and @emersonknapp as reviewers, specifically, I'll wait on their judgement rather than attempt merging for now.

@@ -229,6 +229,30 @@ void Player::play()
std::lock_guard<std::mutex> lk(ready_to_play_from_queue_mutex_);
is_ready_to_play_from_queue_ = false;
ready_to_play_from_queue_cv_.notify_all();

// Wait for all published messages are acknowledged
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Wait for all published messages are acknowledged
// Wait for all published messages to be acknowledged

Comment on lines 66 to 67
// Timeout for waiting all published messages acknowledged
// negative means published messages need not to be acknowledged
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Timeout for waiting all published messages acknowledged
// negative means published messages need not to be acknowledged
// Timeout for waiting for all published messages to be acknowledged.
// Negative values means that published messages do not need to be acknowledged.

@fujitatomoya
Copy link
Contributor

@jhdcs since i do not have any authorization on this repository, you can just go ahead instead of my comments if you are good to go. (i will do the review as community reviewer 😃 )

Comment on lines 39 to 45
def not_negative_int(arg: str) -> int:
value = int(arg)
if value < 0:
raise ArgumentTypeError(f'Value {value} is less than zero.')
return value


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why dont we have this along with check_positive_float as common function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to my understanding, the input parameter of check function is fixed (only arg: str, and this parameter is passed by argparse module).
We cannot use this parameter to specify what to do, such as check positive or not_negative.
So I have no idea how to use one common function to implement check_positive_float and not_negative_int.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was not clear on this, i was just saying to move this function where check_positive_float is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was just saying to move this function where check_positive_float is.

Get it.

Comment on lines 513 to 516
RCLCPP_WARN(
get_logger(),
"--wait-for-all-acked is invaild for topic '%s' since reliability of QOS is BestEffort.",
topic.name.c_str());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if 100 of 200 topics come true with this if statement, there will be 100 warning lines here. which btw, i think it is likely to happen. do we need to print this warning for each topic? how about printing warning once that --wait-for-all-acked is invalid for topics with reliability of QOS is BestEffort.?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if 100 of 200 topics come true with this if statement, there will be 100 warning lines here.

Yes. It is poor user experience.
But I think topic information is useful for user.
How about change as below: (only print one warning for all invalid topics)

-wait-for-all-acked is invalid for below topics with reliability of QOS is BestEffort
xxx, xxx, xxx, xxx, xxx 

@@ -93,6 +101,17 @@ def add_arguments(self, parser, cli_name): # noqa: D102
parser.add_argument(
'--start-offset', type=check_positive_float, default=0.0,
help='Start the playback player this many seconds into the bag file.')
parser.add_argument(
'--wait-for-all-acked', type=not_negative_int, default=0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default behavior would be okay with wait forever to avoid possible problem to miss the last message? we would like to have consensus on this. CC: @emersonknapp @jhdcs
IMO, we would want to have specific timeout like 1 sec instead of wait forever, because if anything happens, wait forever will be not good for user experience.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah.
I think this option should be disabled by default, since most cases don't need this feature.
What do you think ?

if (!pub.second->wait_for_all_acked(timeout)) {
RCLCPP_ERROR(
get_logger(),
"Failed to wait all published messages acknowledged for topic %s",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this happens only when timeout, so how about the following? I think timeout is different from failure or exception.

Suggested change
"Failed to wait all published messages acknowledged for topic %s",
"Timeout to wait all published messages acknowledged for topic %s",

Signed-off-by: Barry Xu <barry.xu@sony.com>
Signed-off-by: Barry Xu <barry.xu@sony.com>
Copy link
Contributor

@jhdcs jhdcs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did another review pass, since several changes were made. Again, just had some comment/documentation suggestions.

'--wait-for-all-acked', type=check_not_negative_int, default=-1,
help='Wait until all published messages are acknowledged by all subscribers or until '
'the timeout elapses in millisecond before play is terminated. '
'Especially for the case of sending message with big size in a short time. '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this flows a little better, though I tend to be a bit wordy.

Suggested change
'Especially for the case of sending message with big size in a short time. '
'Especially useful when sending many messages with large sizes in a short timeframe.'

if (!pub.second->wait_for_all_acked(timeout)) {
RCLCPP_ERROR(
get_logger(),
"Timeout to wait all published messages acknowledged for topic %s",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this reads a little better.

Suggested change
"Timeout to wait all published messages acknowledged for topic %s",
"Timed out while waiting for all published messages to be acknowledged for topic %s",

} catch (std::exception & e) {
RCLCPP_ERROR(
get_logger(),
"Failed to wait all published messages acknowledged for topic %s : %s",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another small wordage suggestion.

Suggested change
"Failed to wait all published messages acknowledged for topic %s : %s",
"Exception occurred while waiting for all published messages to be acknowledged for topic %s : %s",


RCLCPP_WARN(
get_logger(),
"--wait-for-all-acked is invaild for below topics since reliability of QOS is BestEffort.\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor change here.

Suggested change
"--wait-for-all-acked is invaild for below topics since reliability of QOS is BestEffort.\n"
"--wait-for-all-acked is invaild for the below topics since reliability of QOS is BestEffort.\n"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comments.
Please help to check again @jhdcs

Signed-off-by: Barry Xu <barry.xu@sony.com>
Copy link
Contributor

@jhdcs jhdcs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@jhdcs jhdcs merged commit 9447f4e into ros2:master Feb 1, 2022
@clalancette
Copy link
Contributor

Looks good to me!

Did CI from https://ci.ros2.org get run on this PR?

@jhdcs
Copy link
Contributor

jhdcs commented Feb 1, 2022

Oh crud, I thought it had!

Crud, let me see how to undo that...

@jhdcs
Copy link
Contributor

jhdcs commented Feb 1, 2022

Not sure I did that right...

Sorry, first time merging a PR...

@clalancette
Copy link
Contributor

Here's CI after the merge:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@fujitatomoya
Copy link
Contributor

@clalancette thanks for checking on this one. after all, everything looks okay!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants